Abstract. We resolve an open question by determining matching (asymptotic)
upper and lower bounds on the state complexity of the operation that sends a
language $L$ to $\left(\overline{L^*}\right)^*$.



1 Introduction 




Let $\Sigma$ be a finite nonempty alphabet, let $L \subseteq \Sigma^*$ be a
language, let $\overline{L} = \Sigma^* - L$ denote the complement of $L$, and
let $L^*$ (resp., $L^+$) denote the Kleene closure (resp., positive closure) of
the language $L$. If $L$ is a regular language, its state complexity is defined
to be the number of states in the minimal deterministic finite automaton
accepting $L$ [7]. In this paper we resolve an open question by determining
matching (asymptotic) upper and lower bounds on the deterministic state
complexity of the operations

$$L \to \left(\overline{L^*}\right)^* \quad\text{and}\quad L \to \left(\overline{L^+}\right)^+.$$



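To simplify the exposition, we will write everything using an exponent
notation, using $c$ to represent complement, as follows:

$$L^{+c} := \overline{L^+}, \qquad L^{+c+} := \left(\overline{L^+}\right)^+,$$

and similarly for $L^{*c}$ and $L^{*c*}$.
Note that

$$L^{*c*} = \begin{cases} L^{+c+}, & \text{if } \varepsilon \notin L; \\ L^{+c+} \cup \{\varepsilon\}, & \text{if } \varepsilon \in L. \end{cases}$$

It follows that the state complexity of $L^{+c+}$ and $L^{*c*}$ differ by at
most 1. In what follows, we will work only with $L^{+c+}$.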
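
The relation between the star and plus versions can be checked in a few lines. The derivation below is a sketch added for the reader, not taken from the original text; the auxiliary language $K$ is a symbol introduced only for this argument.

```latex
\begin{gather*}
L^{*} = L^{+} \cup \{\varepsilon\}
  \quad\Longrightarrow\quad
  L^{*c} = L^{+c} \setminus \{\varepsilon\}, \\
(K \setminus \{\varepsilon\})^{*} = K^{*} = K^{+} \cup \{\varepsilon\}
  \quad\text{for every language } K, \\
L^{*c*} = \bigl(L^{+c} \setminus \{\varepsilon\}\bigr)^{*}
        = L^{+c+} \cup \{\varepsilon\}, \\
\varepsilon \in L^{+c+}
  \iff \varepsilon \in L^{+c}
  \iff \varepsilon \notin L^{+}
  \iff \varepsilon \notin L .
\end{gather*}
```

So the two languages coincide when $\varepsilon \notin L$ and differ only in the empty string otherwise, which is consistent with the claim that their state complexities differ by at most 1.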
2 Upper Bound



Consider a deterministic finite automaton (DFA) $D = (Q_n, \Sigma, \delta, 0, F)$
accepting a language $L$, where $Q_n := \{0, 1, \ldots, n-1\}$. As an example,
consider the three-state DFA over $\{a,b,c,d\}$ shown in Fig. 1 (left). To get
a nondeterministic finite automaton (NFA) $N_1$ for the language $L^+$ from the
DFA $D$, we add an $\varepsilon$-transition from every non-initial final state
to the state 0. In our example, we add an $\varepsilon$-transition from state 1
to state 0; see Fig. 1 (right). After applying the subset construction to the
NFA $N_1$ we get a DFA $D_1$ for the language $L^+$. The state set of $D_1$
consists of subsets of $Q_n$; see Fig. 2 (left). Here the sets in the labels of
states are written without commas and brackets; thus, for example, 012 stands
for the set $\{0,1,2\}$. Next, we interchange the roles of the final and
non-final states of the DFA $D_1$, and get a DFA $D_2$ for the language
$L^{+c}$; see Fig. 2 (right).

To get an NFA $N_3$ for $L^{+c+}$ from the DFA $D_2$, we add an
$\varepsilon$-transition from each non-initial final state of $D_2$ to the
state $\{0\}$; see Fig. 3 (top). Applying the subset construction to the NFA
$N_3$ results in a DFA $D_3$ for the language $L^{+c+}$ with its state set
consisting of some sets of subsets of $Q_n$; see Fig. 3 (middle). Here, for
example, the label 0, 2 corresponds to the set $\{\{0\},\{2\}\}$. This gives an
upper bound of $2^{2^n}$ on the state complexity of the operation
plus-complement-plus.

Our first result shows that in the minimal DFA for $L^{+c+}$ we do not have
any state $\{S_1, S_2, \ldots, S_k\}$ in which a set $S_i$ is a subset of some
other set $S_j$; see Fig. 3 (bottom). This reduces the upper bound to the
number of antichains of subsets of an $n$-element set, known as the Dedekind
number $M(n)$, with [2]
Fig. 1. DFA $D$ for a language $L$ and NFA $N_1$ for the language $L^+$.
Fig. 2. DFA $D_1$ for language $L^+$ and DFA $D_2$ for the language $L^{+c}$.
Fig. 3. NFA $N_3$, DFA $D_3$, and the minimal DFA $D_3^{\min}$ for the language $L^{+c+}$.



Lemma 1. If $S$ and $T$ are subsets of $Q_n$ such that $S \subseteq T$, then
the states $\{S,T\}$ and $\{S\}$ of the DFA $D_3$ for the language $L^{+c+}$
are equivalent.

Proof. Let $S$ and $T$ be subsets of $Q_n$ such that $S \subseteq T$. We only
need to show that if a string $w$ is accepted by the NFA $N_3$ starting from
the state $T$, then it also is accepted by $N_3$ from the state $S$.

Assume $w$ is accepted by $N_3$ from $T$. Then in the NFA $N_3$, an accepting
computation on $w$ from state $T$ looks like this:

$$T \xrightarrow{u} T_1 \xrightarrow{\varepsilon} \{0\} \xrightarrow{v} T_2,$$

where $w = uv$, and state $T$ goes to an accepting state $T_1$ on $u$ without
using any $\varepsilon$-transitions, then $T_1$ goes to $\{0\}$ on
$\varepsilon$, and then $\{0\}$ goes to an accepting state $T_2$ on $v$; it
also may happen that $w = u$, in which case the computation ends in $T_1$. Let
us show that $S$ goes to an accepting state of the NFA $N_3$ on $u$.

Since $T$ goes to an accepting state $T_1$ on $u$ in the NFA $N_3$ without
using any $\varepsilon$-transition, state $T$ goes to the accepting state $T_1$
in the DFA $D_2$, and therefore to the rejecting state $T_1$ of the DFA $D_1$.
Thus, every state $q$ in $T$ goes to rejecting states in the NFA $N_1$. Since
$S \subseteq T$, every state in $S$ goes to rejecting states in the NFA $N_1$,
and therefore $S$ goes to a rejecting state $S_1$ in the DFA $D_1$, thus to the
accepting state $S_1$ in the DFA $D_2$. Hence $w = uv$ is accepted from $S$ in
the NFA $N_3$ by the computation

$$S \xrightarrow{u} S_1 \xrightarrow{\varepsilon} \{0\} \xrightarrow{v} T_2.$$

□

Hence whenever a state $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ of the DFA
$D_3$ contains two subsets $S_i$ and $S_j$ with $i \ne j$ and
$S_i \subseteq S_j$, it is equivalent to the state
$\mathcal{S} \setminus \{S_j\}$. Using this property, we get the following
result.

Lemma 2. Let $D$ be a DFA for a language $L$ with state set $Q_n$, and
$D_3^{\min}$ be the minimal DFA for $L^{+c+}$ as described above. Then every
state of $D_3^{\min}$ can be expressed in the form

$$\mathcal{S} = \{X_1, X_2, \ldots, X_k\} \qquad (1)$$

where

- $1 \le k \le n$;

- there exist subsets $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_k \subseteq Q_n$; and

- there exist $q_1, \ldots, q_k$, pairwise distinct states of $D$ not in $S_k$; such that

- $X_i = \{q_i\} \cup S_i$ for $i = 1, 2, \ldots, k$.

Proof. Let $D = (Q_n, \Sigma, \delta, 0, F)$.

For a state $q$ in $Q_n$ and a symbol $a$ in $\Sigma$, let $q.a$ denote the
state in $Q_n$ to which $q$ goes on $a$, that is, $q.a = \delta(q, a)$. For a
subset $X$ of $Q_n$ let $X.a$ denote the set of states to which states in $X$
go by $a$, that is,

$$X.a = \bigcup_{q \in X} \{\delta(q,a)\}.$$



Consider transitions on a symbol $a$ in automata $D, N_1, D_1, D_2, N_3$;
Fig. 4 illustrates these transitions. In the NFA $N_1$, each state $q$ goes to
a state in $\{0, q.a\}$ if $q.a$ is a final state of $D$, and to state $q.a$ if
$q.a$ is non-final. It follows that in the DFA $D_1$ for $L^+$, each state $X$
(a subset of $Q_n$) goes on $a$ to the final state $\{0\} \cup X.a$ if $X.a$
contains a final state of $D$, and to the non-final state $X.a$ if all states
in $X.a$ are non-final in $D$. Hence in the DFA $D_2$ for $L^{+c}$, each state
$X$ goes on $a$ to the non-final state $\{0\} \cup X.a$ if $X.a$ contains a
final state of $D$, and to the final state $X.a$ if all states in $X.a$ are
non-final in $D$.

Therefore, in the NFA $N_3$ for $L^{+c+}$, each state $X$ goes on $a$ to a
state in $\{\{0\}, X.a\}$ if all states in $X.a$ are non-final in $D$, and to
state $\{0\} \cup X.a$ if $X.a$ contains a final state of $D$.
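
The transition rules just described can be tried out on a small machine. The following Python sketch is an illustration added here, not part of the original construction: the function names and the 3-state DFA `DELTA` are hypothetical (in particular, it is not the automaton of Fig. 1), and states of $D_3$ are stored in reduced form by keeping only inclusion-minimal sets, as Lemma 1 permits.

```python
def d1_step(delta, finals, X, a):
    """Successor of subset X on symbol a in D_1 (D_2 has the same transitions)."""
    Y = frozenset(delta[(q, a)] for q in X)
    # If a final state of D is hit, the epsilon-move of N_1 adds state 0.
    return Y | {0} if Y & finals else Y


def n3_targets(delta, finals, X, a):
    """States of N_3 reachable from X on a, including the epsilon-move to {0}."""
    Y = d1_step(delta, finals, X, a)
    targets = {Y}
    if not (Y & finals):  # Y is final in D_2, so N_3 may also jump to {0}
        targets.add(frozenset({0}))
    return targets


def reduce_state(family):
    """Keep only inclusion-minimal sets; proper supersets are dropped (Lemma 1)."""
    return frozenset(X for X in family if not any(Y < X for Y in family))


def reachable_reduced_states(delta, finals, alphabet):
    """Explore D_3 from {{0}}, storing every state in reduced (antichain) form."""
    start = frozenset({frozenset({0})})
    seen, frontier = {start}, [start]
    while frontier:
        S = frontier.pop()
        for a in alphabet:
            successors = set()
            for X in S:
                successors |= n3_targets(delta, finals, X, a)
            T = reduce_state(successors)
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    return seen


# Hypothetical 3-state DFA over {a, b}, chosen only for illustration.
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
         (0, "b"): 0, (1, "b"): 2, (2, "b"): 1}
FINALS = frozenset({1})

states = reachable_reduced_states(DELTA, FINALS, "ab")
print(len(states), "reduced states reachable")
```

Reducing after every step is safe here because the successor of a superset is always a superset of some successor of the smaller set, so dropping supersets early or late yields the same reduced family.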



To prove the lemma for each state, we use induction on the length of the
shortest path from the initial state to the state of $D_3^{\min}$ in question.
The base case is a path of length 0. In this case, the initial state is
$\{\{0\}\}$, which is in the required form (1) with $k = 1$, $q_1 = 0$, and
$S_1 = \emptyset$.
[Figure 4 panels. In $D_1$: $X$ goes to the final state $\{0\} \cup X.a$ if
$X.a$ contains a final state, and $Y$ goes to the non-final state $Y.a$ if all
states in $Y.a$ are non-final. In $D_2$: $X$ goes to the non-final state
$\{0\} \cup X.a$ if $X.a$ contains a final state, and $Y$ goes to the final
state $Y.a$ if all the states in $Y.a$ are non-final.]

Fig. 4. Transitions under symbol $a$ in automata $D$, $N_1$, $D_1$, $D_2$, $N_3$.



For the induction step, let

$$\mathcal{S} = \{X_1, X_2, \ldots, X_k\},$$

where $1 \le k \le n$, and

• $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_k \subseteq Q_n$,

• $q_1, \ldots, q_k$ are pairwise distinct states of $D$ that are not in $S_k$, and

• $X_i = \{q_i\} \cup S_i$ for $i = 1, 2, \ldots, k$.

We now prove the result for all states reachable from $\mathcal{S}$ on a
symbol $a$.
First, consider the case that each $X_i$ goes on $a$ to a non-final state
$X_i'$ in the NFA $N_3$. It follows that $\mathcal{S}$ goes on $a$ to
$\mathcal{S}' = \{X_1', X_2', \ldots, X_k'\}$, where

$$X_i' = \{q_i.a\} \cup S_i.a \cup \{0\}.$$

Write $p_i = q_i.a$ and $P_i = S_i.a \cup \{0\}$. Then we have
$P_1 \subseteq P_2 \subseteq \cdots \subseteq P_k \subseteq Q_n$.

If $p_i = p_j$ for some $i, j$ with $i < j$, then $X_i' \subseteq X_j'$, and
therefore $X_j'$ can be removed from state $\mathcal{S}'$ in the minimal DFA
$D_3^{\min}$. After several such removals, we arrive at an equivalent state

$$\mathcal{S}'' = \{X_1'', X_2'', \ldots, X_\ell''\}$$

where $\ell \le k$, $X_i'' = \{r_i\} \cup R_i$ and the states
$r_1, r_2, \ldots, r_\ell$ are pairwise distinct.

If $r_i \in R_\ell$ for some $i$ with $i < \ell$, then
$X_i'' \subseteq R_\ell \subseteq X_\ell''$; thus $X_\ell''$ can be removed.
After all such removals, we get an equivalent set

$$\mathcal{S}''' = \{X_1''', X_2''', \ldots, X_m'''\}$$

where $m \le \ell$, $X_i''' = \{t_i\} \cup T_i$ and the states
$t_1, t_2, \ldots, t_m$ are pairwise distinct and $t_1, t_2, \ldots, t_{m-1}$
are not in $T_m$. If $t_m \notin T_m$, then the state $\mathcal{S}'''$ is in
the required form (1). Otherwise, if $T_{m-1}$ is a proper subset of $T_m$,
then there is a state $t$ in $T_m - T_{m-1}$, and then we can take
$X_m''' = \{t\} \cup (T_m - \{t\})$: since $t_1, \ldots, t_{m-1}$ are not in
$T_m$, they are distinct from $t$, and moreover
$T_{m-1} \subseteq T_m - \{t\}$.

If $T_{m-1} = T_m$, then $X_{m-1}''' \supseteq X_m'''$, and therefore
$X_{m-1}'''$ can be removed from $\mathcal{S}'''$. After all these removals we
either reach some $T_j$ that is a proper subset of $T_m$ and then pick a state
$t$ in $T_m - T_j$ in the same way as above, or we only get a single set $T_m$,
which is in the required form $\{t_m\} \cup (T_m - \{t_m\})$.

This proves that if each $X_i$ in $\mathcal{S}$ goes on $a$ to a non-final
state $X_i'$ in the NFA $N_3$, then $\mathcal{S}$ goes on $a$ in the DFA
$D_3^{\min}$ to a set that is in the required form (1).

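Now consider the case that at least one $X_j$ in $\mathcal{S}$ goes to a final
state $X_j'$ in the NFA $N_3$. It follows that $\mathcal{S}$ goes to a final
state

$$\mathcal{S}' = \{\{0\}, X_1', X_2', \ldots, X_k'\},$$

where $X_j' = \{q_j.a\} \cup S_j.a$ and, if $i \ne j$, then
$X_i' = \{q_i.a\} \cup S_i.a$ or $X_i' = \{0\} \cup \{q_i.a\} \cup S_i.a$. We
now can remove all $X_i'$ that contain state 0, and arrive at an equivalent
state

$$\mathcal{S}'' = \{\{0\}, X_1'', X_2'', \ldots, X_\ell''\},$$

where $\ell \le k$, and $X_i'' = \{p_i\} \cup P_i$, and
$P_1 \subseteq P_2 \subseteq \cdots \subseteq P_\ell \subseteq Q_n$, and each
$p_i$ is distinct from 0.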

Now in the same way as above we arrive at an equivalent state

$$\{\{0\}, \{t_1\} \cup T_1, \ldots, \{t_m\} \cup T_m\}$$

where $m \le \ell$, all the $t_i$ are pairwise distinct and different from 0,
and moreover, the states $t_1, \ldots, t_{m-1}$ are not in $T_m$. If $t_m$ is
not in $T_m$, then we are done. Otherwise, we remove all sets with
$T_i = T_m$. We either arrive at a proper subset $T_j$ of $T_m$, and may pick a
state $t$ in $T_m - T_j$ to play the role of the new $t_m$, or we arrive at
$\{\{0\}, T_m\}$, which is in the required form
$\{\{0\} \cup \emptyset, \{t_m\} \cup (T_m - \{t_m\})\}$.
This completes the proof of the lemma. □

Corollary 1 (Star-Complement-Star: Upper Bound). If a language $L$ is
accepted by a DFA of $n$ states, then the language $L^{*c*}$ is accepted by a
DFA of $2^{O(n \log n)}$ states.

Proof. Lemma 2 gives the following upper bound

$$\sum_{k=1}^{n} \binom{n}{k}\, k!\, (k+1)^{n-k}$$

since we first choose any permutation of $k$ distinct elements
$q_1, \ldots, q_k$, and then represent each set $S_i$ as a disjoint union of
sets $S_1', S_2', \ldots, S_i'$ given by a function $f$ from
$Q_n - \{q_1, \ldots, q_k\}$ to $\{1, 2, \ldots, k+1\}$ as follows:

$$S_i' = \{q \mid f(q) = i\}, \qquad S_i = S_1' \uplus S_2' \uplus \cdots \uplus S_i',$$

while states with $f(q) = k+1$ will be outside each $S_i$; here $\uplus$
denotes a disjoint union. Next, we have

$$\sum_{k=1}^{n} \binom{n}{k}\, k!\, (k+1)^{n-k} \le n! \sum_{k=1}^{n} \binom{n}{k} (n+1)^{n-k} \le n!\,(n+2)^n = 2^{O(n \log n)},$$

and the upper bound follows. □
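
To get a feel for the sizes involved, the sum from the proof can be evaluated directly. This is a small numerical sketch added for illustration; the function name `lemma2_count` is ours, not the paper's.

```python
from math import comb, factorial


def lemma2_count(n):
    """Sum over k = 1..n of C(n,k) * k! * (k+1)^(n-k), as in the proof above."""
    return sum(comb(n, k) * factorial(k) * (k + 1) ** (n - k)
               for k in range(1, n + 1))


# Compare the sum with the closed-form bound n!(n+2)^n and the trivial 2^(2^n).
for n in range(1, 8):
    print(n, lemma2_count(n), factorial(n) * (n + 2) ** n, 2 ** (2 ** n))
```

For small $n$ the numbers are of similar magnitude, but the doubly exponential column quickly dominates, which is what the $2^{O(n \log n)}$ bound expresses.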

Remark 1. The summation $\sum_{k=1}^{n} \binom{n}{k}\, k!\, (k+1)^{n-k}$
differs by one from Sloane's sequence A072597 [5]. These numbers are the
coefficients of the exponential generating function of $1/(e^{-x} - x)$. It
follows, by standard techniques, that these numbers are asymptotically given by
$C_1 W(1)^{-n} n!$, where

$$W(1) = .5671432904097838729999686622103555497538$$

is the Lambert $W$-function evaluated at 1, equal to the positive real solution
of the equation $e^x = 1/x$, and $C_1$ is a constant, approximately

$$1.12511909098678593170279439143182676599.$$

The convergence is quite fast; this gives a somewhat more explicit version of
the upper bound.
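
The asymptotic claim of the remark can be checked numerically. The sketch below is an added illustration under two assumptions: $W(1)$ is computed by a plain Newton iteration written here for the purpose, and the sum is started at $k = 0$ so that it matches the sequence itself rather than the sum shifted by one.

```python
from math import comb, exp, factorial


def a072597(n):
    """Sum over k = 0..n of C(n,k) * k! * (k+1)^(n-k)."""
    return sum(comb(n, k) * factorial(k) * (k + 1) ** (n - k)
               for k in range(n + 1))


def lambert_w_at_1(iterations=50):
    """Solve x * e^x = 1 by Newton's method, starting from x = 0.5."""
    x = 0.5
    for _ in range(iterations):
        x -= (x * exp(x) - 1) / (exp(x) * (x + 1))
    return x


W = lambert_w_at_1()
# The ratio a_n * W^n / n! should approach the constant C_1 of the remark.
for n in (5, 10, 20, 30):
    print(n, a072597(n) * W ** n / factorial(n))
```

The printed ratios settle near 1.125 already for moderate $n$, in line with the statement that the convergence is quite fast.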



Fig. 5. DFA $D$ over $\{a, b, c, d\}$ with many reachable states in DFA $D_3$ for $L^{+c+}$.



3 Lower Bound 



We now turn to the matching lower bound on the state complexity of
plus-complement-plus. The basic idea is to create one DFA where the DFA for
$L^{+c+}$ has many reachable states, and another where the DFA for $L^{+c+}$
has many distinguishable states. Then we "join" them together in Corollary 2.

The following lemma uses a four-letter alphabet to prove the reachability of
some specific states of the DFA $D_3$ for plus-complement-plus.

Lemma 3. There exists an $n$-state DFA
$D = (Q_n, \{a, b, c, d\}, \delta, 0, \{0, 1\})$ such that in the DFA $D_3$ for
the language $L(D)^{+c+}$ every state of the form

$$\{\{0, q_1\} \cup S_1, \{0, q_2\} \cup S_2, \ldots, \{0, q_k\} \cup S_k\}$$

is reachable, where $1 \le k \le n-2$, $S_1, S_2, \ldots, S_k$ are subsets of
$\{2, 3, \ldots, n-2\}$ with $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_k$,
and the $q_1, \ldots, q_k$ are pairwise distinct states in
$\{2, 3, \ldots, n-2\}$ that are not in $S_k$.



Proof. Consider the DFA $D$ over $\{a, b, c, d\}$ shown in Fig. 5. Let $L$ be
the language accepted by the DFA $D$.

Construct the NFA $N_1$ for the language $L^+$ from the DFA $D$ by adding loops
on $a$ and $d$ in the initial state 0. In the subset automaton corresponding to
the NFA $N_1$, every subset of $\{0, 1, \ldots, n-2\}$ containing state 0 is
reachable from the initial state $\{0\}$ on a string over $\{a, b\}$, since
each subset $\{0, i_1, i_2, \ldots, i_k\}$ of size $k+1$, where
$1 \le k \le n-2$ and $1 \le i_1 < i_2 < \cdots < i_k \le n-2$, is reached from
the set $\{0, i_2 - i_1, \ldots, i_k - i_1\}$ of size $k$ on the string
$ab^{i_1 - 1}$. Moreover, after reading every symbol of the string
$ab^{i_1 - 1}$, the subset automaton is always in a set that contains state 0.
All such states are rejecting in the DFA $D_2$ for the language $L^{+c}$, and
therefore, in the NFA $N_3$ for $L^{+c+}$, the initial state $\{0\}$ only goes
to the rejecting state $\{0, i_1, i_2, \ldots, i_k\}$ on $ab^{i_1 - 1}$.

Hence in the DFA $D_3$, for every subset $S$ of $\{0, 1, \ldots, n-2\}$
containing 0, the initial state $\{\{0\}\}$ goes to the state $\{S\}$ on a
string $w$ over $\{a, b\}$.

Now notice that transitions on symbols $a$ and $b$ perform the cyclic
permutation of states in $\{2, 3, \ldots, n-2\}$. For every state $q$ in
$\{2, 3, \ldots, n-2\}$ and an integer $i$, let

$$q \ominus i = ((q - i - 2) \bmod (n-3)) + 2$$

denote the state in $\{2, 3, \ldots, n-2\}$ that goes to the state $q$ on the
string $a^i$, and, in fact, on every string over $\{a, b\}$ of length $i$.
Next, for a subset $S$ of $\{2, 3, \ldots, n-2\}$ let

$$S \ominus i = \{q \ominus i \mid q \in S\}.$$

Thus $S \ominus i$ is a shift of $S$, and if $q \notin S$, then
$q \ominus i \notin S \ominus i$.
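
The shift operation is easy to get wrong by one, so a quick executable check may help. The sketch below is an added illustration; it assumes that the cyclic permutation sends each state $q$ of $\{2, \ldots, n-2\}$ to $q+1$, with $n-2$ wrapping around to 2, which is the reading consistent with the formula. The names `shift_back`, `step` and `shift_set` are ours.

```python
def shift_back(q, i, n):
    """The state of {2,...,n-2} that reaches q after i cyclic steps."""
    return ((q - i - 2) % (n - 3)) + 2


def step(q, n):
    """Assumed action of a (and b) on {2,...,n-2}: q -> q+1, n-2 wraps to 2."""
    return 2 + ((q - 2 + 1) % (n - 3))


def shift_set(S, i, n):
    """Apply the shift to every element of a subset S of {2,...,n-2}."""
    return {shift_back(q, i, n) for q in S}


n = 8  # the cyclic part is {2, 3, 4, 5, 6}
print([shift_back(q, 1, n) for q in range(2, n - 1)])
print(shift_set({2, 4}, 1, n))
```

Because the shift is a bijection on $\{2, \ldots, n-2\}$, membership is preserved in both directions, which is exactly the property used in the induction below.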

The proof of the lemma now proceeds by induction on $k$. To prove the base
case, let $S_1$ be a subset of $\{2, 3, \ldots, n-2\}$ and $q_1$ be a state in
$\{2, 3, \ldots, n-2\}$ with $q_1 \notin S_1$. In the NFA $N_3$, the initial
state $\{0\}$ goes to the state $\{0\} \cup S_1$ on a string $w$ over
$\{a, b\}$. Next, state $q_1 \ominus |w|$ is in $\{2, 3, \ldots, n-2\}$, and it
is reached from state 1 on a string $b^\ell$, while state 0 goes to itself on
$b$. In the DFA $D_3$ we thus have

$$\{\{0\}\} \xrightarrow{a} \{\{0, 1\}\} \xrightarrow{b^\ell} \{\{0, q_1 \ominus |w|\}\} \xrightarrow{w} \{\{0, q_1\} \cup S_1\},$$

which proves the base case.

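Now assume that every set of size $k-1$ satisfying the lemma is reachable in
the DFA $D_3$. Let

$$\mathcal{S} = \{\{0, q_1\} \cup S_1, \{0, q_2\} \cup S_2, \ldots, \{0, q_k\} \cup S_k\}$$

be a set of size $k$ satisfying the lemma. Let $w$ be a string on which
$\{\{0\}\}$ goes to $\{\{0\} \cup S_1\}$, and let $\ell$ be an integer such
that 1 goes to $q_1 \ominus |w|$ on $b^\ell$. Let

$$\mathcal{S}' = \{\{0, q_2 \ominus |w| \ominus \ell\} \cup S_2 \ominus |w| \ominus \ell, \ldots, \{0, q_k \ominus |w| \ominus \ell\} \cup S_k \ominus |w| \ominus \ell\},$$

where the operation $\ominus$ is understood to have left-associativity. Then
$\mathcal{S}'$ is reachable by induction. On $c$, every set
$\{0, q_i \ominus |w| \ominus \ell\} \cup S_i \ominus |w| \ominus \ell$ goes to
the accepting state
$\{n-1, q_i \ominus |w| \ominus \ell\} \cup S_i \ominus |w| \ominus \ell$ in
the NFA $N_3$, and therefore also to the initial state $\{0\}$. Then, on $d$,
every state
$\{n-1, q_i \ominus |w| \ominus \ell\} \cup S_i \ominus |w| \ominus \ell$ goes
to the rejecting state
$\{0, q_i \ominus |w| \ominus \ell\} \cup S_i \ominus |w| \ominus \ell$, while
$\{0\}$ goes to $\{0, 1\}$. Hence, in the DFA $D_3$ we have

$$\begin{aligned}
\mathcal{S}' &\xrightarrow{c} \{\{0\}, \{n-1, q_2 \ominus |w| \ominus \ell\} \cup S_2 \ominus |w| \ominus \ell, \ldots, \{n-1, q_k \ominus |w| \ominus \ell\} \cup S_k \ominus |w| \ominus \ell\} \\
&\xrightarrow{d} \{\{0, 1\}, \{0, q_2 \ominus |w| \ominus \ell\} \cup S_2 \ominus |w| \ominus \ell, \ldots, \{0, q_k \ominus |w| \ominus \ell\} \cup S_k \ominus |w| \ominus \ell\} \\
&\xrightarrow{b^\ell} \{\{0, q_1 \ominus |w|\}, \{0, q_2 \ominus |w|\} \cup S_2 \ominus |w|, \ldots, \{0, q_k \ominus |w|\} \cup S_k \ominus |w|\} \xrightarrow{w} \mathcal{S}.
\end{aligned}$$

It follows that $\mathcal{S}$ is reachable in the DFA $D_3$. This concludes the
proof. □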

The next lemma shows that some rejecting states of the DFA $D_3$, in which
no set is a subset of some other set, may be pairwise distinguishable. To prove
the result it uses four symbols, one of which is the symbol $b$ from the proof
of the previous lemma.







Fig. 6. DFA D over {b, e, f, g} with many distinguishable states in DFA D3. 

Lemma 4. Let $n \ge 5$. There exists an $n$-state DFA
$D = (Q_n, \Sigma, \delta, 0, \{0, 1\})$ over a four-letter alphabet $\Sigma$
such that all the states of the DFA $D_3$ for the language $L(D)^{+c+}$ of the
form

$$\{\{0\} \cup T_1, \{0\} \cup T_2, \ldots, \{0\} \cup T_k\},$$

in which no set is a subset of some other set and each
$T_i \subseteq \{2, 3, \ldots, n-2\}$, are pairwise distinguishable.



Proof. To prove the lemma, we reuse the symbol $b$ from the proof of Lemma 3
and define three new symbols $e, f, g$ as shown in Fig. 6.

Notice that on states $2, 3, \ldots, n-2$, the symbol $b$ performs a full
cyclic permutation, while $e$ performs a transposition, and $f$ a contraction.
It follows that every transformation of states $2, 3, \ldots, n-2$ can be
performed by strings over $\{b, e, f\}$. In particular, for each subset $T$ of
$\{2, 3, \ldots, n-2\}$, there is a string $w_T$ over $\{b, e, f\}$ such that
in $D$, each state in $T$ goes to state 2 on $w_T$, while each state in
$\{2, 3, \ldots, n-2\} \setminus T$ goes to state 3 on $w_T$. Moreover, state 0
remains in itself while reading the string $w_T$. Next, the symbol $g$ sends
state 0 to state 2, state 3 to state 0, and state 2 to itself.
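
The claim that a cycle, a transposition and a contraction generate every transformation can be confirmed by brute force on a small set. The sketch below is an added illustration: the states $2, \ldots, n-2$ are relabelled $0, \ldots, m-1$, and the three concrete generators are assumptions made for the example, since the exact maps of Fig. 6 are not reproduced here.

```python
def generated_transformations(m):
    """Close {cycle, transposition, contraction} on {0,...,m-1} under composition."""
    cycle = tuple((i + 1) % m for i in range(m))   # i -> i+1 cyclically
    swap = (1, 0) + tuple(range(2, m))             # exchanges 0 and 1
    contract = (1,) + tuple(range(1, m))           # sends 0 to 1, fixes the rest
    gens = [cycle, swap, contract]
    seen = set(gens)
    frontier = list(gens)
    while frontier:
        t = frontier.pop()
        for g in gens:
            u = tuple(g[t[i]] for i in range(m))   # apply t first, then g
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen


for m in (3, 4):
    print(m, len(generated_transformations(m)), m ** m)
```

In particular, for every subset $T$ the map sending $T$ to one fixed state and its complement to another is among the generated transformations, which is the role played by the string $w_T$.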

It follows that in the NFA $N_3$, the state $\{0\} \cup T$, as well as each
state $\{0\} \cup T'$ with $T' \subseteq T$, goes to the accepting state
$\{2\}$ on $w_T \cdot g$. However, every other state $\{0\} \cup T''$ with
$T'' \subseteq \{2, 3, \ldots, n-2\}$ is in a state containing 0, thus in a
rejecting state of $N_3$, while reading $w_T \cdot g$, and it is in the
rejecting state $\{0, 3\}$ after reading $w_T$. Then $\{0, 3\}$ goes to the
rejecting state $\{0, 2\}$ on reading $g$.

Hence the string $w_T \cdot g$ is accepted by the NFA $N_3$ from each state
$\{0\} \cup T'$ with $T' \subseteq T$, but rejected from any other state
$\{0\} \cup T''$ with $T'' \subseteq \{2, 3, \ldots, n-2\}$.

Now consider two different states of the DFA $D_3$

$$\mathcal{T} = \{\{0\} \cup T_1, \ldots, \{0\} \cup T_k\},$$
$$\mathcal{R} = \{\{0\} \cup R_1, \ldots, \{0\} \cup R_\ell\},$$

in which no set is a subset of some other set and where each $T_i$ and each
$R_j$ is a subset of $\{2, 3, \ldots, n-2\}$. Then, without loss of generality,
there is a set $\{0\} \cup T_i$ in $\mathcal{T}$ that is not in $\mathcal{R}$.
If no set $\{0\} \cup T'$ with $T' \subseteq T_i$ is in $\mathcal{R}$, then the
string $w_{T_i} \cdot g$ is accepted from $\mathcal{T}$ but not from
$\mathcal{R}$. If there is a subset $T'$ of $T_i$ such that $\{0\} \cup T'$ is
in $\mathcal{R}$, then for each subset $T''$ of $T'$ the set $\{0\} \cup T''$
cannot be in $\mathcal{T}$, and then the string $w_{T'} \cdot g$ is accepted
from $\mathcal{R}$ but not from $\mathcal{T}$. □
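
The distinguishing argument can be replayed in an abstract form that does not need the automaton itself. In the sketch below, added for illustration, a family of sets "accepts" a test set $T$ exactly when one of its members is contained in $T$; this models the behaviour of the string $w_T \cdot g$ described above. The ground set $\{2, 3, 4\}$ and all function names are assumptions made for the example.

```python
from itertools import combinations


def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def antichains(ground):
    """All nonempty families of subsets in which no member contains another."""
    subs = subsets(ground)
    found = []
    for r in range(1, len(subs) + 1):
        for family in combinations(subs, r):
            if all(not (x < y or y < x) for x, y in combinations(family, 2)):
                found.append(frozenset(family))
    return found


def signature(family, ground):
    """The tests T for which some member of the family is contained in T."""
    return frozenset(T for T in subsets(ground) if any(X <= T for X in family))


GROUND = {2, 3, 4}
families = antichains(GROUND)
signatures = {signature(f, GROUND) for f in families}
print(len(families), len(signatures))
```

Equal counts mean that no two antichains accept the same set of tests, so each pair is separated by at least one test, mirroring the pairwise distinguishability shown in the proof.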






Corollary 2 (Star-Complement-Star: Lower Bound). There exists a language $L$
accepted by an $n$-state DFA over a seven-letter input alphabet, such that
any DFA for the language $L^{*c*}$ has $2^{\Omega(n \log n)}$ states.

Proof. Let $\Sigma = \{a, b, c, d, e, f, g\}$ and let $L$ be the language
accepted by the $n$-state DFA
$D = (\{0, 1, \ldots, n-1\}, \Sigma, \delta, 0, \{0, 1\})$, where transitions
on symbols $a, b, c, d$ are defined as in the proof of Lemma 3, and on symbols
$e, f, g$ as in the proof of Lemma 4.

Let $m = \lfloor n/2 \rfloor$. By Lemma 3, the following states are reachable
in the DFA $D_3$ for $L^{+c+}$:

$$\{\{0, 2\} \cup S_1, \{0, 3\} \cup S_2, \ldots, \{0, m-2\} \cup S_{m-3}\},$$

where $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_{m-3} \subseteq \{m-1, m, \ldots, n-2\}$.
The number of such sequences of subsets $S_i$ is given by $(m-2)^{n-m}$, and we
have

$$(m-2)^{n-m} = 2^{\Omega(n \log n)}.$$

By Lemma 4 all these states are pairwise distinguishable, and the lower bound
follows. □
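
The counting step rests on a general fact: the number of nested sequences $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_j$ inside a ground set of $N$ elements is $(j+1)^N$, because each element either first appears at one of the $j$ positions or never appears. The sketch below, added for illustration with names of our choosing, compares this formula with direct enumeration on small cases.

```python
from itertools import product


def count_chains_brute(j, ground_size):
    """Count sequences S_1 <= ... <= S_j of subsets (as bitmasks) by enumeration."""
    masks = range(1 << ground_size)
    count = 0
    for chain in product(masks, repeat=j):
        # chain[i] is contained in chain[i+1] iff it has no bit outside it
        if all((chain[i] & ~chain[i + 1]) == 0 for i in range(j - 1)):
            count += 1
    return count


def count_chains_formula(j, ground_size):
    """Each element picks its first index of appearance, or none: j+1 choices."""
    return (j + 1) ** ground_size


for j in (1, 2, 3):
    for size in (0, 1, 2, 3):
        print(j, size, count_chains_brute(j, size), count_chains_formula(j, size))
```

The same encoding by a function into $j+1$ values is what the proof of Corollary 1 uses for the upper bound, so the two bounds are counting the same kind of object.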

Hence we have an asymptotically tight bound on the state complexity of the
star-complement-star operation that is significantly smaller than $2^{2^n}$.

Theorem 1. The state complexity of star-complement-star is $2^{\Theta(n \log n)}$.



4 Applications 

We conclude with an application. 

Corollary 3. Let $L$ be a regular language, accepted by a DFA with $n$ states.
Then any language that can be expressed in terms of $L$ and the operations of
positive closure, Kleene closure, and complement has state complexity bounded
by $2^{\Theta(n \log n)}$.

Proof. As shown in [1], every such language can be expressed, up to inclusion
of $\varepsilon$, as one of the following 5 languages and their complements:

$$L, \; L^{+}, \; L^{c+}, \; L^{+c+}, \; L^{c+c+}.$$

If the state complexity of $L$ is $n$, then clearly the state complexity of
$L^c$ is also $n$. Furthermore, we know that the state complexity of $L^+$ is
bounded by $2^n$ (a more exact bound can be found in [7]); this also handles
$L^{c+}$. The remaining languages can be handled with Theorem 1. □